163 research outputs found

NAS-VAD: Neural Architecture Search for Voice Activity Detection

Author: Ko Jong Hwan
Park Jinhyeok
Rho Daniel
Publication venue: 'International Speech Communication Association'
Publication date: 29/03/2022

Various neural network-based approaches have been proposed for more robust and accurate voice activity detection (VAD). Manual design of such neural architectures is an error-prone and time-consuming process, which prompted the development of neural architecture search (NAS) methods that automatically design and optimize network architectures. While NAS has been successfully applied to improve performance in a variety of tasks, it has not yet been exploited in the VAD domain. In this paper, we present the first work that applies NAS approaches to the VAD task. To effectively search architectures for the VAD task, we propose a modified macro structure and a new search space with a much broader range of operations that includes attention operations. The results show that the network structures found by the proposed NAS framework outperform previous manually designed state-of-the-art VAD models on various noise-added and real-world-recorded datasets. We also show that the architectures searched on a particular dataset achieve improved generalization performance on unseen audio datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD.
Comment: Submitted to Interspeech 202

arXiv.org e-Print Archive

Neural Residual Flow Fields for Efficient Video Representations

Author: Cho Junwoo
Ko Jong Hwan
Park Eunbyung
Rho Daniel
Publication date: 05/10/2022

Neural fields have emerged as a powerful paradigm for representing various signals, including videos. However, research on improving the parameter efficiency of neural fields is still in its early stages. Even though neural fields that map coordinates to colors can be used to encode video signals, this scheme does not exploit the spatial and temporal redundancy of video signals. Inspired by standard video compression algorithms, we propose a neural field architecture for representing and compressing videos that deliberately removes data redundancy through the use of motion information across video frames. Maintaining motion information, which is typically smoother and less complex than color signals, requires far fewer parameters. Furthermore, reusing color values through motion information further improves the network parameter efficiency. In addition, we suggest using more than one reference frame for video frame reconstruction and separate networks, one for optical flows and the other for residuals. Experimental results show that the proposed method outperforms the baseline methods by a significant margin. The code is available at https://github.com/daniel03c1/eff_video_representation
Comment: Accepted for ACCV 2022, codes are available at https://github.com/daniel03c1/eff_video_representatio
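
The reconstruction idea in this abstract (reuse colors from a reference frame via motion, then correct with a residual) can be illustrated with a small sketch. This is not the authors' implementation: the function name `reconstruct_frame`, the single grayscale reference frame, the integer per-pixel flow, and the border clamping are all assumptions made for illustration. In the paper the flows and residuals come from separate neural networks, and more than one reference frame is used.

```python
def reconstruct_frame(reference, flow, residual):
    """Warp `reference` by a per-pixel integer `flow`, then add `residual`.

    reference: 2D list of pixel intensities (hypothetical grayscale frame).
    flow:      2D list of (dy, dx) offsets saying where each output pixel
               reads its color from in the reference frame.
    residual:  2D list of corrections added after warping.
    """
    height, width = len(reference), len(reference[0])
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = flow[y][x]
            # Clamp source coordinates so reads stay inside the frame.
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            row.append(reference[sy][sx] + residual[y][x])
        frame.append(row)
    return frame


reference = [[10, 20, 30],
             [40, 50, 60]]
# Content moves one pixel to the right, so each pixel reads from its left neighbor.
flow = [[(0, -1)] * 3 for _ in range(2)]
# The residual corrects the one pixel that motion alone does not explain.
residual = [[0, 0, 0],
            [0, 0, 5]]
frame = reconstruct_frame(reference, flow, residual)
```

In this toy case the flow field is constant and the residual is almost entirely zero, which loosely mirrors the abstract's point that motion is smoother and cheaper to represent than raw color.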

arXiv.org e-Print Archive

Understanding Contrastive Learning Through the Lens of Margins

Author: Kim TaeSoo
Park JaeHan
Park Jaehyun
Park Sooill
Rho Daniel
Publication date: 10/10/2023

Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition tasks. Interestingly, despite the shared reliance on the same similarity metrics and objective functions, contrastive learning has not actively adopted margins. Furthermore, decision-boundary-based explanations are the only ones that have been used to explain the effect of margins in contrastive learning. In this work, we propose a new perspective to understand the role of margins based on gradient analysis. Based on the new perspective, we analyze how margins affect gradients of contrastive learning and separate the effect into more elemental levels. We separately analyze each and provide possible directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients depending on positive sample angles and logits are the keys to improving the generalization performance of contrastive learning in both seen and unseen datasets, and other factors can only marginally improve performance.
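
To make the setup in this abstract concrete, the following is a minimal sketch of a contrastive loss that uses cosine similarity and cross-entropy, with a margin applied to the positive pair. The additive angular margin (in the style used for face recognition), the function names `cosine` and `margin_info_nce`, and the default `temperature` are assumptions chosen for illustration. The abstract does not specify the exact margin formulation the authors analyze.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_info_nce(anchor, positive, negatives, margin=0.0, temperature=0.1):
    """Cross-entropy over cosine logits with an angular margin on the positive.

    The margin is added to the anchor-positive angle (an assumed,
    ArcFace-style choice), which lowers the positive logit and so
    makes the objective harder to satisfy.
    """
    cos_pos = max(-1.0, min(1.0, cosine(anchor, positive)))
    theta = math.acos(cos_pos)
    # Keep the shifted angle within [0, pi] so the cosine stays monotonic.
    pos_logit = math.cos(min(math.pi, theta + margin)) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp for the softmax normalizer.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_z - pos_logit


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_plain = margin_info_nce(anchor, positive, negatives, margin=0.0)
loss_margin = margin_info_nce(anchor, positive, negatives, margin=0.3)
```

With these toy vectors, a positive margin yields a larger loss than no margin for the same embeddings, so the positive pair must be pulled closer to reach the same loss value.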

arXiv.org e-Print Archive

How Do Foreign Accents Impact Perception and Credibility?

Author: Park Lindsay
Rho Daniel
Sethi Shyam
Vasquez Areli
Worley Taylor C.
Publication venue: eScholarship, University of California
Publication date: 01/05/2020

The paper aims to investigate how foreign accents impact perception and credibility by looking at various experiments that the researchers have conducted. To observe the effects that foreign accents have on listeners, we outlined three critical areas: visual and auditory stimuli, subtitle comprehension, and perception. By having an in-group or native accent as our control group, we were able to evaluate how various accents, such as Dutch and German, have a subtle impact on the accuracy of the speakers rated and measured by the participants. Based on our analysis, foreign-accented speakers are perceived to be less credible. In addition, it was concluded that perception also plays a key role in the day-to-day life of non-native speakers. While more research would be beneficial, it is clear that foreign accents reduce the speakers’ credibility and should be considered in environments such as job interviews and other social settings.

eScholarship - University of California

Hexa: Self-Improving for Knowledge-Grounded Dialogue System

Author: Han Gunsoo
Jo Daejin
Kim Sungwoong
Kwon Taehwan
Nam Daniel Wontae
On Kyoung-Woon
Rho Seungeun
Publication date: 22/10/2023

A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.

arXiv.org e-Print Archive

Highly Clumpy Structure of the Thermal Composite Supernova Remnant 3C391 Unveiled by Chandra

Author: Chen
Cox
Frail
Hwang
Hwang
Moffett
Patrick O. Slane
Petruk
Q. Daniel Wang
Reach
Reach
Rho
White
Wilner
Yang Chen
Yang Su
Yusef-Zadeh
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

The nature of the internal thermal X-ray emission seen in "thermal composite" supernova remnants is still uncertain. The Chandra observation of 3C391 shows a southeast-northwest elongated morphology and unveils a highly clumpy structure of the remnant. Detailed spatially resolved spectral analysis for the small-scale features reveals normal metal abundance and uniform temperature for the interior gas. The properties of the hot gas comparatively favor the cloudlet evaporation model as a main mechanism for the "thermal composite" X-ray appearance, though radiative rim and thermal conduction may also be effective. A faint protrusion is found in Si and S lines out of the southwest radio border.
Comment: 7 pages, 4 embedded figures, in COSPAR 2004 session E1.4, "Young Neutron Stars and Supernova Remnants", Advances in Space Research, in press

arXiv.org e-Print Archive

Crossref

ScholarWorks@UMass Amherst

CERN Document Server

Prospects for Pentaquark Production at Meson Factories

Author: Abe
Abe
Adkins
Aitala
Aitala
Aleev
Alt
Anderson
Armstrong
Arndt
Asratyan
Barmin
Barth
Bleicher
Borisyuk
Buccella
Cahn
Callan
Callan
Carlson
Casher
Chemtob
Cheung
Cohen
Cohen
Cohen
Csikor
Daniel R Marlow
Diakonov
Diakonov
Diehl
Dytman
Dzierba
Fischer
Gignoux
Hagiwara
Haidenbauer
Igor R Klebanov
Itzhaki
Jaffe
Jaffe
Jenkins
Jennings
Juengst
Kaplan
Karliner
Karliner
Karliner
Kubarovsky
Lipkin
Manohar
Nakano
Nussinov
Oh
Oh
Praszalowicz
Randrup
Rho
Riska
Rosner
Roy
Sasaki
Skyrme
Skyrme
Stancu
Stancu
Stepanyan
Strottman
Thomas E Browder
Walliser
Wang
Weigel
Witten
Witten
Publication venue: 'Elsevier BV'
Publication date: 30/03/2004

Following Rosner [hep-ph/0312269], we consider B-decay production channels for the exotic $I=0$ and $I=3/2$ pentaquarks that have been recently reported. We also discuss new search channels for isovector pentaquarks, such as the $\Theta^{*++}$ $(\bar s duuu)$, that are generically present in chiral soliton models but were not observed in recent experiments. Furthermore, we argue that weak decays of charmed baryons, such as the $\Lambda_c^+$ and $\Xi_c^0$, provide another clean way of detecting exotic baryons made of light quarks only. We also discuss discovery channels for charmed pentaquarks, such as the isosinglet $\Theta_c^0$ $(\bar c udud)$, in weak decays of bottom mesons and baryons. Finally, we discuss prospects for inclusive production of pentaquarks in $e^+ e^-$ collisions, with associated production of particles carrying the opposite baryon number.
Comment: 15 pages, LaTeX; v2,v3: minor corrections, references added; v4: minor modifications, the version published in Physics Letters

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Interstellar Silicate Dust in the z=0.89 Absorber Towards PKS 1830-211: Crystalline Silicates at High Redshift?

Author: Aguirre
Aguirre
Ardila
Bernard-Salas
Bernstein
Boissé
Bottinelli
Bouwman
Bringa
Calzetti
Carrez
Chary
Chengalur
Chiar
Chiar
Courbin
Dai
Daniel E. Welty
Debopam Som
Djorgovski
Donald G. York
Donnarumma
Dorschner
Elbaz
Fabian
Falco
Ferraro
Frye
Gibb
Gibb
Giovanni Vladilo
Gordon
Gruendl
Hao
Harker
Higdon
Honda
Houck
Jäger
Kemper
Kemper
Kulkarni
Kulkarni
Kulkarni
Kulkarni
Lehár
Lovell
Lynch
Madau
McGough
Molster
Monique C. Aller
Mutschke
Papovich
Pei
Pettini
Press
Prochaska
Prochaska
Prochaska
Rho
Riess
Roche
Roche
Shapley
Smith
Sofia
Spoon
Stanghellini
Storrie-Lombardi
Subrahmanyan
Subrahmanyan
Sylvester
Uchida
van Diedenhoven
Varsha P. Kulkarni
Weingartner
Welty
Werner
Whittet
Whittet
Wiklind
Winn
Wooden
Publication venue: 'IOP Publishing'
Publication date: 24/01/2012

We present evidence of a >10-sigma detection of the 10 micron silicate dust absorption feature in the spectrum of the gravitationally lensed quasar PKS 1830-211, produced by a foreground absorption system at redshift 0.886. We have examined more than 100 optical depth templates, derived from both observations of Galactic and extragalactic sources and laboratory measurements, in order to constrain the chemical structure of the silicate dust. We find that the best fit to the observed absorption profile is produced by laboratory crystalline olivine, with a corresponding peak optical depth of tau_10=0.27+/-0.05. The fit is slightly improved by including small contributions from additional materials such as silica, enstatite, or serpentine, which suggests that the dust composition may consist of a blend of crystalline silicates. Combining templates for amorphous and crystalline silicates, we find that the fraction of crystalline silicates needs to be at least 95%. Given the rarity of extragalactic sources with such a high degree of silicate crystallinity, we also explore the possibility that the observed spectral features are produced by amorphous silicates in combination with other molecular or atomic transitions, or by foreground source contamination. While we cannot rule out these latter possibilities, they lead to much poorer profile fits than for the crystalline olivine templates. If the presence of crystalline interstellar silicates in this distant galaxy is real, it would be highly unusual, given that the Milky Way interstellar matter contains essentially only amorphous silicates. It is possible that the z=0.886 absorber towards PKS 1830-211, well known for its high molecular content, has a unique star-forming environment that enables crystalline silicates to form and prevail.
Comment: 67 pages, 21 figures, accepted for publication in the Astrophysical Journal

arXiv.org e-Print Archive

Crossref